Jasper Slingsby
No one ever does them…
…but they could save so much pain and suffering if they did!!!
Statistical power is the probability of a hypothesis test finding an effect if there is an effect to be found.
Power analysis is a calculation typically used to estimate the smallest sample size needed for an experiment, given a required significance level, statistical power, and effect size.
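As a rough illustration of that calculation, base R's `power.t.test()` (in the `stats` package) can solve for the sample size when the other three quantities are given. The numbers below (a difference of 0.5, a standard deviation of 2, and 80% power) are assumptions chosen for illustration only, not recommendations.

```r
# Minimal sketch: solve for the sample size of a one-sample, two-sided t-test.
# delta, sd and power are illustrative assumptions.
required <- power.t.test(delta = 0.5,        # smallest difference we care about
                         sd = 2,             # expected standard deviation
                         sig.level = 0.05,   # alpha
                         power = 0.8,        # desired statistical power
                         type = "one.sample",
                         alternative = "two.sided")

# Round up, because we cannot sample a fraction of a subject
n_needed <- ceiling(required$n)
n_needed
```

Leaving a different argument out (e.g. `power` instead of `n`) makes the same function solve for that quantity instead.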
Firstly, it helps you plan your analyses before you’ve done your data collection, which is always useful.
Secondly, not knowing the statistical power of your analysis can result in
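Type II Error: missing an effect that is really there (a missing finding).
Type I Error: finding an effect that is not really there (a false finding).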
Type I and Type II Errors and how they result in false or missing findings, respectively. Image from Norton and Strube 2001, JOSPT.
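Statistical power is determined by the combination of the significance level (\(\alpha\)), the effect size, the variability among subjects, and the sample size.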
We usually use an \(\alpha\) of 0.05 to indicate significant difference.
This is a subjective cut-off, but is generally accepted in the literature…
You have greater statistical power when you have greater differences in means (effect size). P1 vs P3 has greater power than either P1 vs P2 or P2 vs P3.
Greater variability among subjects results in larger standard deviations, reducing our ability to distinguish among groups (i.e. statistical power).
Increasing sample size increases statistical power by improving the estimate of the mean and constricting the distribution of the test statistic (i.e. reducing the standard error (SE)).
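To make the link between sample size and standard error concrete, here is a small sketch using the usual formula SE = sd / sqrt(n). The helper name `standard_error` and the standard deviation of 2 are assumptions made for this example.

```r
# Minimal sketch: the standard error of the mean shrinks as n grows.
# standard_error() is a hypothetical helper, not a base R function.
standard_error <- function(sd, n) sd / sqrt(n)

sample_sizes <- c(10, 50, 100, 500)
se_values <- standard_error(sd = 2, n = sample_sizes)

# Tabulate sample size against standard error
data.frame(n = sample_sizes, SE = round(se_values, 3))
```

Because of the square root, halving the standard error takes four times as many samples.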
Simulate the data you would expect to collect in your study, varying the:
…and test for significant difference using the appropriate statistical test.
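One way to turn this recipe into an estimate of power is to repeat the simulation many times and count how often the test comes out significant. The sketch below assumes normally distributed data and a one-sample t-test; the function name `simulate_power`, its arguments and the number of simulations are all illustrative choices.

```r
# Minimal sketch: estimate power by simulation.
# simulate_power() is a hypothetical helper written for this example.
set.seed(42)  # make the random draws reproducible

simulate_power <- function(n, true_mean, true_sd, mu0 = 0,
                           alpha = 0.05, n_sims = 2000) {
  # Simulate n_sims datasets and keep the p-value from each test
  p_values <- replicate(n_sims, {
    x <- rnorm(n = n, mean = true_mean, sd = true_sd)
    t.test(x, alternative = "two.sided", mu = mu0)$p.value
  })
  # Power is the proportion of simulated studies that detect the effect
  mean(p_values < alpha)
}

power_n50  <- simulate_power(n = 50,  true_mean = 1, true_sd = 2, mu0 = 0.5)
power_n100 <- simulate_power(n = 100, true_mean = 1, true_sd = 2, mu0 = 0.5)
c(n50 = power_n50, n100 = power_n100)
```

The same function can be rerun with different values of `alpha`, `true_sd` or the difference between `true_mean` and `mu0` to see how each one changes the estimated power.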
First, we need to simulate some data.
If we believe our data are normally distributed, we can use the handy rnorm() function, like so:
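A minimal sketch of such a call is given here. The sample size of 50, mean of 1 and standard deviation of 1 are assumptions inferred from the later examples, and `set.seed()` is added only so the sketch is reproducible, so the numbers it produces will not match the output shown in these slides.

```r
# Minimal sketch: simulate 50 normally distributed values.
# n, mean and sd are assumed values; the seed is arbitrary.
set.seed(1)

df <- data.frame(Data = rnorm(n = 50,   # set the sample size
                              mean = 1, # set the mean
                              sd = 1),  # set the standard deviation
                 Treatment = 1)
```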
Now let’s look at our new data
We can plot it like so:
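A base R histogram is one simple option. This sketch simulates its own data so that it runs on its own; the values passed to `rnorm()` and the axis labels are assumptions.

```r
# Minimal sketch: histogram of simulated data using base R graphics.
set.seed(1)
df <- data.frame(Data = rnorm(n = 50, mean = 1, sd = 1), Treatment = 1)

h <- hist(df$Data,
          main = "Simulated data",  # plot title
          xlab = "Data",            # x-axis label
          col = "grey80")

# Add a vertical dashed line at the sample mean
abline(v = mean(df$Data), lty = 2)
```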
A one-sample t-test tests the hypothesis that the mean of our population is a specific value (e.g. 0).
t.test(x = df$Data, # set our vector of data values
alternative = "two.sided", # specify the alternative hypothesis (here "not zero", so it is two-sided, versus "greater" or "less")
mu = 0) # set the "true value" of the mean
One Sample t-test
data: df$Data
t = 7.4549, df = 49, p-value = 1.314e-09
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
0.6873307 1.1946444
sample estimates:
mean of x
0.9409876
In this case, the difference is highly significant! P < 0.00000001!!!
What if we fiddle with the \(\alpha\) (“significance” level)?
but
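A quick way to explore this is to compare the p-value reported above (1.314e-09) against several candidate cut-offs. The particular \(\alpha\) values below are assumptions picked for illustration.

```r
# Minimal sketch: the same p-value judged against different alpha levels.
p_value <- 1.314e-09  # p-value from the one-sample t-test output above
alphas <- c(0.05, 0.01, 0.001, 1e-10)  # illustrative cut-offs

# TRUE where the result would be called "significant"
significant <- p_value < alphas
data.frame(alpha = alphas, significant = significant)
```

The result counts as significant at the first three cut-offs but not at the strictest one, even though the data have not changed.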
Now let’s fiddle with the difference between group means (effect size).
In this case, this is most easily done by shifting mu closer to the mean of our randomly generated data, like so:
t.test(x = df$Data, # set our vector of data values
alternative = "two.sided", # specify the alternative hypothesis
mu = 0.5) # set the "true value" of the mean
One Sample t-test
data: df$Data
t = 3.4937, df = 49, p-value = 0.00102
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.6873307 1.1946444
sample estimates:
mean of x
0.9409876
Here we’ve reduced the effect size from 1 to 0.5, but the sample mean is still significantly different from mu.
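The drop in the t statistic can be checked by hand, since t = (sample mean - mu) / SE and the data (and therefore the standard error) did not change between the two tests. The sketch below uses only the numbers printed in the two outputs above; the variable names are made up for the example.

```r
# Minimal sketch: recover the second t statistic from the first output.
x_bar <- 0.9409876   # sample mean reported in both outputs
t_mu0 <- 7.4549      # t statistic when mu = 0

# With mu = 0, t = x_bar / SE, so the standard error is:
se <- x_bar / t_mu0

# Shifting mu to 0.5 shrinks the numerator but leaves SE unchanged
t_mu05 <- (x_bar - 0.5) / se
round(t_mu05, 4)
```

This agrees with the t statistic reported for mu = 0.5, up to rounding.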
Now let’s fiddle with variability among subjects.
# Make new data with greater variability (standard deviation = 2)
df <- data.frame(Data =
rnorm(n = 50, # set the sample size
mean = 1, # set the mean
sd = 2), # set bigger standard deviation
Treatment = 1)
# Run t-test
t.test(x = df$Data,
alternative = "two.sided",
mu = 0.5)
One Sample t-test
data: df$Data
t = 1.1985, df = 49, p-value = 0.2365
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.2341031 1.5516756
sample estimates:
mean of x
0.8928893
With double the variability (standard deviation), and an effect size of 0.5, the result is no longer significantly different…
Now let’s increase the sample size.
# Make new data with greater sample size (n = 100)
df <- data.frame(Data =
rnorm(n = 100, # set the sample size
mean = 1, # set the mean
sd = 2), # set bigger standard deviation
Treatment = 1)
# Run t-test
t.test(x = df$Data,
alternative = "two.sided",
mu = 0.5)
One Sample t-test
data: df$Data
t = 2.2564, df = 99, p-value = 0.02624
alternative hypothesis: true mean is not equal to 0.5
95 percent confidence interval:
0.5510605 1.2954907
sample estimates:
mean of x
0.9232756
Aha! Greater sample size gives us greater statistical power, and the result is significant again.
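Rather than relying on a single simulated dataset, the expected power at each sample size can be computed analytically with `power.t.test()` from base R. This sketch assumes the same setting as the examples above (a true difference of 0.5 and a standard deviation of 2); the sample size of 500 is an extra value added for comparison.

```r
# Minimal sketch: analytical power of a one-sample, two-sided t-test
# at several sample sizes (delta and sd match the examples above).
sample_sizes <- c(50, 100, 500)

power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n,
               delta = 0.5,       # true difference between mean and mu
               sd = 2,            # standard deviation
               sig.level = 0.05,  # alpha
               type = "one.sample",
               alternative = "two.sided")$power
})

data.frame(n = sample_sizes, power = round(power_by_n, 2))
```

Power below 0.5 means that a study of that size would miss a real effect more often than it would find it, which helps explain why the n = 50 test above was not significant.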